{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "whjsJasuhstV"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_05_4_custom_parsers.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "euOZxlIMhstX"
   },
   "source": [
    "# T81-559: Applications of Generative Artificial Intelligence\n",
    "**Module 5: LangChain: Data Extraction**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d4Yov72PhstY"
   },
   "source": [
    "# Module 5 Material\n",
    "\n",
    "* Part 5.1: Structured Output Parser [[Video]](https://www.youtube.com/watch?v=62CSR141VRE) [[Notebook]](t81_559_class_05_1_langchain_data.ipynb)\n",
    "* Part 5.2: Other Parsers (CSV, JSON, Pandas, Datetime) [[Video]](https://www.youtube.com/watch?v=VXm8gPzU3qc) [[Notebook]](t81_559_class_05_2_parsers.ipynb)\n",
    "* Part 5.3: Pydantic parser [[Video]](https://www.youtube.com/watch?v=dc4fn-W60hg) [[Notebook]](t81_559_class_05_3_pydantic.ipynb)\n",
    "* **Part 5.4: Custom Output Parser** [[Video]](https://www.youtube.com/watch?v=jBpkAblQC_U) [[Notebook]](t81_559_class_05_4_custom_parsers.ipynb)\n",
    "* Part 5.5: Output-Fixing Parser [[Video]](https://www.youtube.com/watch?v=_txWiLjf4bo) [[Notebook]](t81_559_class_05_5_output_fixing_parsers.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AcAUP0c3hstY"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running and maps Google Drive if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xsI496h5hstZ",
    "outputId": "dadfd0cc-594d-4dd8-c947-9db8f9b7dd7e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: using Google CoLab\n",
      "Requirement already satisfied: langchain in /usr/local/lib/python3.10/dist-packages (0.2.0)\n",
      "Requirement already satisfied: langchain_openai in /usr/local/lib/python3.10/dist-packages (0.1.7)\n",
      "Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0.1)\n",
      "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.30)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (3.9.5)\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.3)\n",
      "Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.6.6)\n",
      "Requirement already satisfied: langchain-core<0.3.0,>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.2.1)\n",
      "Requirement already satisfied: langchain-text-splitters<0.3.0,>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.2.0)\n",
      "Requirement already satisfied: langsmith<0.2.0,>=0.1.17 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.1.62)\n",
      "Requirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.25.2)\n",
      "Requirement already satisfied: pydantic<3,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.7.1)\n",
      "Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.31.0)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.3.0)\n",
      "Requirement already satisfied: openai<2.0.0,>=1.24.0 in /usr/local/lib/python3.10/dist-packages (from langchain_openai) (1.30.1)\n",
      "Requirement already satisfied: tiktoken<1,>=0.7 in /usr/local/lib/python3.10/dist-packages (from langchain_openai) (0.7.0)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.4)\n",
      "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (3.21.2)\n",
      "Requirement already satisfied: typing-inspect<1,>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json<0.7,>=0.5.7->langchain) (0.9.0)\n",
      "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.3.0,>=0.2.0->langchain) (1.33)\n",
      "Requirement already satisfied: packaging<24.0,>=23.2 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.3.0,>=0.2.0->langchain) (23.2)\n",
      "Requirement already satisfied: orjson<4.0.0,>=3.9.14 in /usr/local/lib/python3.10/dist-packages (from langsmith<0.2.0,>=0.1.17->langchain) (3.10.3)\n",
      "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.24.0->langchain_openai) (3.7.1)\n",
      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai<2.0.0,>=1.24.0->langchain_openai) (1.7.0)\n",
      "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.24.0->langchain_openai) (0.27.0)\n",
      "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.24.0->langchain_openai) (1.3.1)\n",
      "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.24.0->langchain_openai) (4.66.4)\n",
      "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.24.0->langchain_openai) (4.11.0)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (0.6.0)\n",
      "Requirement already satisfied: pydantic-core==2.18.2 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (2.18.2)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.7)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2024.2.2)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (3.0.3)\n",
      "Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken<1,>=0.7->langchain_openai) (2023.12.25)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai<2.0.0,>=1.24.0->langchain_openai) (1.2.1)\n",
      "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai<2.0.0,>=1.24.0->langchain_openai) (1.0.5)\n",
      "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai<2.0.0,>=1.24.0->langchain_openai) (0.14.0)\n",
      "Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.10/dist-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.3.0,>=0.2.0->langchain) (2.4)\n",
      "Requirement already satisfied: mypy-extensions>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain) (1.0.0)\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "try:\n",
    "    from google.colab import drive, userdata\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# OpenAI Secrets\n",
    "if COLAB:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
    "\n",
    "# Install needed libraries in CoLab\n",
    "if COLAB:\n",
    "    !pip install langchain langchain_openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pC9A-LaYhsta"
   },
   "source": [
    "# 5.4: Custom Output Parsers\n",
    "\n",
    "In certain scenarios, you might want to create a custom parser to format the model output uniquely.\n",
    "\n",
    "There are two ways to create a custom parser:\n",
    "\n",
    "* Using **RunnableLambda** or **RunnableGenerator** in LCEL - This is the recommended approach for most cases.\n",
    "* Inheriting from one of the base classes for output parsing - This is the more challenging method.\n",
    "\n",
    "The differences between these approaches are mostly superficial, primarily involving which callbacks are triggered (e.g., on_chain_start vs. on_parser_start) and how a runnable lambda vs. a parser is visualized in a tracing platform like LangSmith.\n",
    "\n",
    "I suggest using runnable lambdas and runnable generators for parsing.\n",
    "\n",
    "The following code creates a basic LLM model to use.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MMgvbZVmxgdv"
   },
   "outputs": [],
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "MODEL = 'gpt-4o-mini'\n",
    "TEMPERATURE = 0.0\n",
    "\n",
    "# Initialize the OpenAI LLM with your API key\n",
    "llm = ChatOpenAI(\n",
    "    model=MODEL,\n",
    "    temperature=TEMPERATURE,\n",
    "    n=1\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "mDH-sL9WNP4v"
   },
   "source": [
    "In this section, we will create a simple parser that inverts the case of the model's output.\n",
    "\n",
    "For example, if the model outputs \"Hello World,\" the parser will transform it to \"hELLO wORLD.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "id": "uC_-RKjzwxtH",
    "outputId": "3a3c02a1-85b0-4c01-a819-c583c407d066"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'hELLO! hOW CAN i ASSIST YOU TODAY?'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from typing import Iterable\n",
    "\n",
    "from langchain_core.messages import AIMessage, AIMessageChunk\n",
    "\n",
    "def parse(ai_message: AIMessage) -> str:\n",
    "    \"\"\"Parse the AI message.\"\"\"\n",
    "    return ai_message.content.swapcase()\n",
    "\n",
    "\n",
    "chain = llm | parse\n",
    "chain.invoke(\"hello\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iC5GdyhEy2oh"
   },
   "source": [
    "## Inherting from Parsing Base Classes\n",
    "\n",
    "Another way to implement a parser is by inheriting from BaseOutputParser, BaseGenerationOutputParser, or another base parser depending on your needs.\n",
    "\n",
    "We generally do not recommend this approach for most use cases, as it requires more code without offering significant benefits.\n",
    "\n",
    "The simplest type of output parser extends the BaseOutputParser class and must implement the following methods:\n",
    "\n",
    "* **parse**: Takes the string output from the model and parses it.\n",
    "* **(optional) _type**: Identifies the name of the parser.\n",
    "When the output from the chat model or LLM is malformed, the parser can throw an OutputParserException to indicate that parsing failed due to bad input. Using this exception allows code utilizing the parser to handle exceptions consistently.\n",
    "\n",
    "Since BaseOutputParser implements the Runnable interface, any custom parser you create this way will become a valid LangChain Runnable, benefiting from automatic async support, batch interface, logging support, and more.\n",
    "\n",
    "Here's a simple parser that can parse a string representation of a boolean (e.g., YES or NO) and convert it into the corresponding boolean type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "gKS6XvUKwxhz"
   },
   "outputs": [],
   "source": [
    "from langchain_core.exceptions import OutputParserException\n",
    "from langchain_core.output_parsers import BaseOutputParser\n",
    "\n",
    "\n",
    "class BooleanOutputParser(BaseOutputParser[bool]):\n",
    "    \"\"\"Custom parser to interpret 'YES'/'NO' strings as boolean values.\"\"\"\n",
    "\n",
    "    true_val: str = \"YES\"\n",
    "    false_val: str = \"NO\"\n",
    "\n",
    "    def parse(self, text: str) -> bool:\n",
    "        \"\"\"\n",
    "        Parse the input text and return a boolean value.\n",
    "\n",
    "        Args:\n",
    "            text (str): The input text to parse.\n",
    "\n",
    "        Returns:\n",
    "            bool: True if text matches true_val, False if it matches false_val.\n",
    "\n",
    "        Raises:\n",
    "            OutputParserException: If the text does not match true_val or false_val.\n",
    "        \"\"\"\n",
    "        cleaned_text = text.strip().upper()\n",
    "        if cleaned_text not in (self.true_val.upper(), self.false_val.upper()):\n",
    "            raise OutputParserException(\n",
    "                f\"BooleanOutputParser expected output value to be either \"\n",
    "                f\"{self.true_val} or {self.false_val} (case-insensitive). \"\n",
    "                f\"Received {cleaned_text}.\"\n",
    "            )\n",
    "        return cleaned_text == self.true_val.upper()\n",
    "\n",
    "    @property\n",
    "    def _type(self) -> str:\n",
    "        \"\"\"\n",
    "        Return the type of the parser.\n",
    "\n",
    "        Returns:\n",
    "            str: The type of the parser.\n",
    "        \"\"\"\n",
    "        return \"boolean_output_parser\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Ki6OVZpOzhoE",
    "outputId": "c381fffe-ad6a-4bea-e750-141643f8451a"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parser = BooleanOutputParser()\n",
    "parser.invoke(\"YES\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Iw04HNY-wxOf",
    "outputId": "121938db-dfb8-43a6-8f96-6368e6f68d12"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Triggered an exception of type: <class 'langchain_core.exceptions.OutputParserException'>\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    parser.invoke(\"MEOW\")\n",
    "except Exception as e:\n",
    "    print(f\"Triggered an exception of type: {type(e)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "43wfaEfpzrOh",
    "outputId": "ec65bf24-aa94-42f9-8f8e-c07ab8b76b1a"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parser = BooleanOutputParser(true_val=\"OKAY\")\n",
    "parser.invoke(\"OKAY\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "AgAU4on6zvBh",
    "outputId": "0864b706-a6ee-4026-a582-d0908beeca59"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[True, False]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parser.batch([\"OKAY\", \"NO\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "gNPJtPDwz5VC",
    "outputId": "b7715e0f-d714-49f8-cc96-c0a174e0cf63"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[True, False]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "await parser.abatch([\"OKAY\", \"NO\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ZMGThhGn0A3e",
    "outputId": "36fe3baa-a72b-4038-df8d-165b93c424dc"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='OKAY', response_metadata={'token_usage': {'completion_tokens': 2, 'prompt_tokens': 13, 'total_tokens': 15}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-b969e2cc-df17-4c98-a890-703318c5a4c2-0')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "llm.invoke(\"say either OKAY or NO\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "aMNLEDsE0LqJ",
    "outputId": "8149ed7c-c703-43ee-83a7-f641f1f53e93"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain = llm | parser\n",
    "chain.invoke(\"say either OKAY or NO\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "UV2coxRD57Wh"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "isGwPhF4GUM5"
   },
   "source": [
    "### Stripping Non-Python Text\n",
    "\n",
    "Large Language Models (LLMs) like GPT-4 are capable of generating text that seamlessly intermixes code and explanatory descriptions. While this can be incredibly useful for learning and documentation purposes, it can pose challenges when one needs to extract and execute only the code from such mixed-content outputs. To address this, we will implement a simple function designed to strip non-Python code lines from a given text string.\n",
    "\n",
    "This approach involves using regular expressions to identify and retain lines that match typical Python syntax while discarding lines that appear to be descriptive text. However, due to the inherent complexity and variability of both Python code and natural language, this method can never be perfect. It relies on heuristic patterns that may sometimes misclassify code as text or vice versa.\n",
    "\n",
    "In the next section, we will explore how another LLM can assist in the process of stripping non-Python code, potentially offering a more sophisticated and accurate solution. The following sample contains a mixture of both LLM comments and generated code.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "kjKY36CmGaW_"
   },
   "outputs": [],
   "source": [
    "# Example usage\n",
    "mixed_text = \"\"\"\n",
    "Yes, you can estimate the value of Pi using various methods in Python. One\n",
    "common approach is the Monte Carlo method. Here's a simple example:\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "def estimate_pi(num_samples):\n",
    "    inside_circle = 0\n",
    "\n",
    "    for _ in range(num_samples):\n",
    "        x = random.uniform(0, 1)\n",
    "        y = random.uniform(0, 1)\n",
    "        distance = x**2 + y**2\n",
    "\n",
    "        if distance <= 1:\n",
    "            inside_circle += 1\n",
    "\n",
    "    pi_estimate = (inside_circle / num_samples) * 4\n",
    "    return pi_estimate\n",
    "\n",
    "num_samples = 1000000\n",
    "pi_estimate = estimate_pi(num_samples)\n",
    "print(f\"Estimated value of Pi: {pi_estimate}\")\n",
    "```\n",
    "\n",
    "This code uses the Monte Carlo method to estimate Pi by generating random points\n",
    "within a unit square and checking how many fall inside a quarter circle. The\n",
    "ratio of points inside the circle to the total points, multiplied by 4, gives an\n",
    "estimate of Pi.\n",
    "\n",
    "Would you like to explore other methods or need further explanation on this\n",
    "approach?\n",
    "\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FuJeEXjrJujm"
   },
   "source": [
    "We now provide a function to strip the non-Python text. The extract_python_code function works by utilizing regular expressions to locate and extract blocks of Python code enclosed within triple backticks. It uses the re.findall function with a pattern that matches text between python and delimiters. The re.DOTALL flag is included to ensure that the regular expression can match newline characters within the code block, allowing for multi-line code extraction. The matched code blocks are then joined into a single string, with any leading or trailing whitespace removed using the strip method. This approach effectively isolates the Python code from the surrounding mixed text, making it easy to extract and use independently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "PR7NDFTXJc6K"
   },
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def extract_python_code(mixed_text):\n",
    "    code_blocks = re.findall(r'```python(.*?)```', mixed_text, re.DOTALL)\n",
    "    return \"\\n\".join(code_blocks).strip()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WmzgoZKULQ5F"
   },
   "source": [
    "The following shows how we can use the extract_python_code to extract the Python code.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "pmDfdbgpJevN",
    "outputId": "c8db4e30-d467-447f-d3b8-6ca4cb980a57"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "import random\n",
      "\n",
      "def estimate_pi(num_samples):\n",
      "    inside_circle = 0\n",
      "\n",
      "    for _ in range(num_samples):\n",
      "        x = random.uniform(0, 1)\n",
      "        y = random.uniform(0, 1)\n",
      "        distance = x**2 + y**2\n",
      "\n",
      "        if distance <= 1:\n",
      "            inside_circle += 1\n",
      "\n",
      "    pi_estimate = (inside_circle / num_samples) * 4\n",
      "    return pi_estimate\n",
      "\n",
      "num_samples = 1000000\n",
      "pi_estimate = estimate_pi(num_samples)\n",
      "print(f\"Estimated value of Pi: {pi_estimate}\")\n"
     ]
    }
   ],
   "source": [
    "python_code = extract_python_code(mixed_text)\n",
    "print(python_code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "56odnzkBHRDh"
   },
   "source": [
    "### Creating a Code Output Parser.\n",
    "\n",
    "We now create a custom output parser to remove any non-Python code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "j1V60nfQ8lDa"
   },
   "outputs": [],
   "source": [
    "from langchain_core.exceptions import OutputParserException\n",
    "from langchain_core.output_parsers import BaseOutputParser\n",
    "\n",
    "class CodeOutputParser(BaseOutputParser[str]):\n",
    "    \"\"\"Custom code parser.\"\"\"\n",
    "\n",
    "    def parse(self, text):\n",
    "      return extract_python_code(text)\n",
    "\n",
    "    @property\n",
    "    def _type(self) -> str:\n",
    "        return \"CodeOutputParser\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "80uVtlDxL7rr"
   },
   "source": [
    "As demonstrated here, only the Python code is output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 350
    },
    "id": "ILEcSf8c8_4P",
    "outputId": "74e40f5e-a348-47d4-a352-74c3f7cce207"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>pre { line-height: 125%; }\n",
       "td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }\n",
       "span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }\n",
       "td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }\n",
       "span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }\n",
       ".output_html .hll { background-color: #ffffcc }\n",
       ".output_html { background: #f8f8f8; }\n",
       ".output_html .c { color: #3D7B7B; font-style: italic } /* Comment */\n",
       ".output_html .err { border: 1px solid #FF0000 } /* Error */\n",
       ".output_html .k { color: #008000; font-weight: bold } /* Keyword */\n",
       ".output_html .o { color: #666666 } /* Operator */\n",
       ".output_html .ch { color: #3D7B7B; font-style: italic } /* Comment.Hashbang */\n",
       ".output_html .cm { color: #3D7B7B; font-style: italic } /* Comment.Multiline */\n",
       ".output_html .cp { color: #9C6500 } /* Comment.Preproc */\n",
       ".output_html .cpf { color: #3D7B7B; font-style: italic } /* Comment.PreprocFile */\n",
       ".output_html .c1 { color: #3D7B7B; font-style: italic } /* Comment.Single */\n",
       ".output_html .cs { color: #3D7B7B; font-style: italic } /* Comment.Special */\n",
       ".output_html .gd { color: #A00000 } /* Generic.Deleted */\n",
       ".output_html .ge { font-style: italic } /* Generic.Emph */\n",
       ".output_html .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */\n",
       ".output_html .gr { color: #E40000 } /* Generic.Error */\n",
       ".output_html .gh { color: #000080; font-weight: bold } /* Generic.Heading */\n",
       ".output_html .gi { color: #008400 } /* Generic.Inserted */\n",
       ".output_html .go { color: #717171 } /* Generic.Output */\n",
       ".output_html .gp { color: #000080; font-weight: bold } /* Generic.Prompt */\n",
       ".output_html .gs { font-weight: bold } /* Generic.Strong */\n",
       ".output_html .gu { color: #800080; font-weight: bold } /* Generic.Subheading */\n",
       ".output_html .gt { color: #0044DD } /* Generic.Traceback */\n",
       ".output_html .kc { color: #008000; font-weight: bold } /* Keyword.Constant */\n",
       ".output_html .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */\n",
       ".output_html .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */\n",
       ".output_html .kp { color: #008000 } /* Keyword.Pseudo */\n",
       ".output_html .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */\n",
       ".output_html .kt { color: #B00040 } /* Keyword.Type */\n",
       ".output_html .m { color: #666666 } /* Literal.Number */\n",
       ".output_html .s { color: #BA2121 } /* Literal.String */\n",
       ".output_html .na { color: #687822 } /* Name.Attribute */\n",
       ".output_html .nb { color: #008000 } /* Name.Builtin */\n",
       ".output_html .nc { color: #0000FF; font-weight: bold } /* Name.Class */\n",
       ".output_html .no { color: #880000 } /* Name.Constant */\n",
       ".output_html .nd { color: #AA22FF } /* Name.Decorator */\n",
       ".output_html .ni { color: #717171; font-weight: bold } /* Name.Entity */\n",
       ".output_html .ne { color: #CB3F38; font-weight: bold } /* Name.Exception */\n",
       ".output_html .nf { color: #0000FF } /* Name.Function */\n",
       ".output_html .nl { color: #767600 } /* Name.Label */\n",
       ".output_html .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */\n",
       ".output_html .nt { color: #008000; font-weight: bold } /* Name.Tag */\n",
       ".output_html .nv { color: #19177C } /* Name.Variable */\n",
       ".output_html .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */\n",
       ".output_html .w { color: #bbbbbb } /* Text.Whitespace */\n",
       ".output_html .mb { color: #666666 } /* Literal.Number.Bin */\n",
       ".output_html .mf { color: #666666 } /* Literal.Number.Float */\n",
       ".output_html .mh { color: #666666 } /* Literal.Number.Hex */\n",
       ".output_html .mi { color: #666666 } /* Literal.Number.Integer */\n",
       ".output_html .mo { color: #666666 } /* Literal.Number.Oct */\n",
       ".output_html .sa { color: #BA2121 } /* Literal.String.Affix */\n",
       ".output_html .sb { color: #BA2121 } /* Literal.String.Backtick */\n",
       ".output_html .sc { color: #BA2121 } /* Literal.String.Char */\n",
       ".output_html .dl { color: #BA2121 } /* Literal.String.Delimiter */\n",
       ".output_html .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */\n",
       ".output_html .s2 { color: #BA2121 } /* Literal.String.Double */\n",
       ".output_html .se { color: #AA5D1F; font-weight: bold } /* Literal.String.Escape */\n",
       ".output_html .sh { color: #BA2121 } /* Literal.String.Heredoc */\n",
       ".output_html .si { color: #A45A77; font-weight: bold } /* Literal.String.Interpol */\n",
       ".output_html .sx { color: #008000 } /* Literal.String.Other */\n",
       ".output_html .sr { color: #A45A77 } /* Literal.String.Regex */\n",
       ".output_html .s1 { color: #BA2121 } /* Literal.String.Single */\n",
       ".output_html .ss { color: #19177C } /* Literal.String.Symbol */\n",
       ".output_html .bp { color: #008000 } /* Name.Builtin.Pseudo */\n",
       ".output_html .fm { color: #0000FF } /* Name.Function.Magic */\n",
       ".output_html .vc { color: #19177C } /* Name.Variable.Class */\n",
       ".output_html .vg { color: #19177C } /* Name.Variable.Global */\n",
       ".output_html .vi { color: #19177C } /* Name.Variable.Instance */\n",
       ".output_html .vm { color: #19177C } /* Name.Variable.Magic */\n",
       ".output_html .il { color: #666666 } /* Literal.Number.Integer.Long */</style><div class=\"highlight\"><pre><span></span><span class=\"kn\">import</span> <span class=\"nn\">random</span>\n",
       "\n",
       "<span class=\"k\">def</span> <span class=\"nf\">estimate_pi</span><span class=\"p\">(</span><span class=\"n\">num_points</span><span class=\"p\">):</span>\n",
       "    <span class=\"n\">inside_circle</span> <span class=\"o\">=</span> <span class=\"mi\">0</span>\n",
       "    <span class=\"n\">total_points</span> <span class=\"o\">=</span> <span class=\"n\">num_points</span>\n",
       "\n",
       "    <span class=\"k\">for</span> <span class=\"n\">_</span> <span class=\"ow\">in</span> <span class=\"nb\">range</span><span class=\"p\">(</span><span class=\"n\">num_points</span><span class=\"p\">):</span>\n",
       "        <span class=\"n\">x</span> <span class=\"o\">=</span> <span class=\"n\">random</span><span class=\"o\">.</span><span class=\"n\">uniform</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"mi\">1</span><span class=\"p\">)</span>\n",
       "        <span class=\"n\">y</span> <span class=\"o\">=</span> <span class=\"n\">random</span><span class=\"o\">.</span><span class=\"n\">uniform</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"mi\">1</span><span class=\"p\">)</span>\n",
       "\n",
       "        <span class=\"k\">if</span> <span class=\"n\">x</span><span class=\"o\">**</span><span class=\"mi\">2</span> <span class=\"o\">+</span> <span class=\"n\">y</span><span class=\"o\">**</span><span class=\"mi\">2</span> <span class=\"o\">&lt;=</span> <span class=\"mi\">1</span><span class=\"p\">:</span>\n",
       "            <span class=\"n\">inside_circle</span> <span class=\"o\">+=</span> <span class=\"mi\">1</span>\n",
       "\n",
       "    <span class=\"n\">pi_estimate</span> <span class=\"o\">=</span> <span class=\"mi\">4</span> <span class=\"o\">*</span> <span class=\"n\">inside_circle</span> <span class=\"o\">/</span> <span class=\"n\">total_points</span>\n",
       "    <span class=\"k\">return</span> <span class=\"n\">pi_estimate</span>\n",
       "\n",
       "<span class=\"n\">num_points</span> <span class=\"o\">=</span> <span class=\"mi\">1000000</span>\n",
       "<span class=\"n\">pi_estimate</span> <span class=\"o\">=</span> <span class=\"n\">estimate_pi</span><span class=\"p\">(</span><span class=\"n\">num_points</span><span class=\"p\">)</span>\n",
       "<span class=\"nb\">print</span><span class=\"p\">(</span><span class=\"sa\">f</span><span class=\"s2\">&quot;Estimated value of Pi using </span><span class=\"si\">{</span><span class=\"n\">num_points</span><span class=\"si\">}</span><span class=\"s2\"> points: </span><span class=\"si\">{</span><span class=\"n\">pi_estimate</span><span class=\"si\">}</span><span class=\"s2\">&quot;</span><span class=\"p\">)</span>\n",
       "</pre></div>\n"
      ],
      "text/latex": [
       "\\begin{Verbatim}[commandchars=\\\\\\{\\}]\n",
       "\\PY{k+kn}{import} \\PY{n+nn}{random}\n",
       "\n",
       "\\PY{k}{def} \\PY{n+nf}{estimate\\PYZus{}pi}\\PY{p}{(}\\PY{n}{num\\PYZus{}points}\\PY{p}{)}\\PY{p}{:}\n",
       "    \\PY{n}{inside\\PYZus{}circle} \\PY{o}{=} \\PY{l+m+mi}{0}\n",
       "    \\PY{n}{total\\PYZus{}points} \\PY{o}{=} \\PY{n}{num\\PYZus{}points}\n",
       "\n",
       "    \\PY{k}{for} \\PY{n}{\\PYZus{}} \\PY{o+ow}{in} \\PY{n+nb}{range}\\PY{p}{(}\\PY{n}{num\\PYZus{}points}\\PY{p}{)}\\PY{p}{:}\n",
       "        \\PY{n}{x} \\PY{o}{=} \\PY{n}{random}\\PY{o}{.}\\PY{n}{uniform}\\PY{p}{(}\\PY{l+m+mi}{0}\\PY{p}{,} \\PY{l+m+mi}{1}\\PY{p}{)}\n",
       "        \\PY{n}{y} \\PY{o}{=} \\PY{n}{random}\\PY{o}{.}\\PY{n}{uniform}\\PY{p}{(}\\PY{l+m+mi}{0}\\PY{p}{,} \\PY{l+m+mi}{1}\\PY{p}{)}\n",
       "\n",
       "        \\PY{k}{if} \\PY{n}{x}\\PY{o}{*}\\PY{o}{*}\\PY{l+m+mi}{2} \\PY{o}{+} \\PY{n}{y}\\PY{o}{*}\\PY{o}{*}\\PY{l+m+mi}{2} \\PY{o}{\\PYZlt{}}\\PY{o}{=} \\PY{l+m+mi}{1}\\PY{p}{:}\n",
       "            \\PY{n}{inside\\PYZus{}circle} \\PY{o}{+}\\PY{o}{=} \\PY{l+m+mi}{1}\n",
       "\n",
       "    \\PY{n}{pi\\PYZus{}estimate} \\PY{o}{=} \\PY{l+m+mi}{4} \\PY{o}{*} \\PY{n}{inside\\PYZus{}circle} \\PY{o}{/} \\PY{n}{total\\PYZus{}points}\n",
       "    \\PY{k}{return} \\PY{n}{pi\\PYZus{}estimate}\n",
       "\n",
       "\\PY{n}{num\\PYZus{}points} \\PY{o}{=} \\PY{l+m+mi}{1000000}\n",
       "\\PY{n}{pi\\PYZus{}estimate} \\PY{o}{=} \\PY{n}{estimate\\PYZus{}pi}\\PY{p}{(}\\PY{n}{num\\PYZus{}points}\\PY{p}{)}\n",
       "\\PY{n+nb}{print}\\PY{p}{(}\\PY{l+s+sa}{f}\\PY{l+s+s2}{\\PYZdq{}}\\PY{l+s+s2}{Estimated value of Pi using }\\PY{l+s+si}{\\PYZob{}}\\PY{n}{num\\PYZus{}points}\\PY{l+s+si}{\\PYZcb{}}\\PY{l+s+s2}{ points: }\\PY{l+s+si}{\\PYZob{}}\\PY{n}{pi\\PYZus{}estimate}\\PY{l+s+si}{\\PYZcb{}}\\PY{l+s+s2}{\\PYZdq{}}\\PY{p}{)}\n",
       "\\end{Verbatim}\n"
      ],
      "text/plain": [
       "import random\n",
       "\n",
       "def estimate_pi(num_points):\n",
       "    inside_circle = 0\n",
       "    total_points = num_points\n",
       "\n",
       "    for _ in range(num_points):\n",
       "        x = random.uniform(0, 1)\n",
       "        y = random.uniform(0, 1)\n",
       "\n",
       "        if x**2 + y**2 <= 1:\n",
       "            inside_circle += 1\n",
       "\n",
       "    pi_estimate = 4 * inside_circle / total_points\n",
       "    return pi_estimate\n",
       "\n",
       "num_points = 1000000\n",
       "pi_estimate = estimate_pi(num_points)\n",
       "print(f\"Estimated value of Pi using {num_points} points: {pi_estimate}\")"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import Code, display\n",
    "\n",
    "parser = CodeOutputParser()\n",
    "chain = llm | parser\n",
    "result = chain.invoke(\"Can I create Python code to estimate the value of Pi.\")\n",
    "display(Code(result, language='python'))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.11 (genai)",
   "language": "python",
   "name": "genai"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
